Enable RL training from base model #44

garyzhang99 · 2025-05-21T07:17:37Z

Description

The workflow doesn't support training from a base model because it calls self.model.chat instead of self.model.generate. This PR should enable training from a base model.
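To illustrate the distinction, here is a minimal sketch of the two input shapes. It is not code from this PR: the function names and the "Question:/Answer:" layout are assumptions made for illustration. A chat-style call takes role-tagged messages and relies on a chat template, while a base model only continues a plain string.

```python
from typing import Dict, List


def format_chat_messages(system_prompt: str, question: str) -> List[Dict[str, str]]:
    """Chat-style input: role-tagged messages, with templating left to the chat API."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": question},
    ]


def format_base_prompt(system_prompt: str, question: str, reply_prefix: str = "") -> str:
    """Plain-text input for a base model: a single string with no chat template.

    The "Question:/Answer:" layout is a hypothetical choice, not the PR's format.
    """
    return f"{system_prompt}\n\nQuestion: {question}\nAnswer: {reply_prefix}"
```

A base model has not been tuned on a chat template, so the plain-string form is the one it can continue directly.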

Checklist

Please check the following items before code is ready to be reviewed.

Code has passed all tests
Docstrings have been added/updated in Google Style
Documentation has been updated
Code is ready for review

yanxi-chen · 2025-05-21T12:46:28Z

I'd recommend deleting the newly defined tokenize_text(_async) methods if they are not used elsewhere and are irrelevant to the goal of this PR as stated in the description.

pan-x-c · 2025-05-23T07:47:33Z

trinity/common/workflows/workflow.py

            messages.append({"role": "assistant", "content": self.reply_prefix})
        return messages

+    def format_prompt(self):


Adding a new workflow named BaseModelWorkflow may be better.
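One way such a workflow could look, as a rough sketch only: every name below except BaseModelWorkflow and format_prompt is hypothetical, and EchoModel is a stand-in rather than the project's real model wrapper. The point is that prompt formatting and the generate call live in a dedicated class instead of a branch inside the existing workflow.

```python
from typing import List


class EchoModel:
    """Hypothetical stand-in for a model wrapper that exposes generate()."""

    def generate(self, prompt: str, n: int = 1) -> List[str]:
        # A real wrapper would sample n completions from the model.
        return [f"<completion {i} for: {prompt!r}>" for i in range(n)]


class BaseModelWorkflow:
    """Sketch of a workflow that feeds a plain-text prompt to a base model."""

    def __init__(self, model, task_desc: str, reply_prefix: str = ""):
        self.model = model
        self.task_desc = task_desc
        self.reply_prefix = reply_prefix

    def format_prompt(self) -> str:
        # Plain string, no chat template; the layout is an assumption.
        return f"Question: {self.task_desc}\nAnswer: {self.reply_prefix}"

    def run(self, n: int = 1) -> List[str]:
        # Uses generate() rather than chat(), which is what a base model needs.
        return self.model.generate(self.format_prompt(), n=n)
```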

pan-x-c · 2025-05-23T07:49:42Z

trinity/common/config.py

    # for unpaired preference dataset
    label_key: str = ""

+    use_base_format: bool = False


Using a new workflow type can do the same thing. Don't add a new field here.
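A sketch of what selecting by workflow type could look like, under the assumption that workflows are looked up by name. The registry, decorator, config field, and class names here are all hypothetical and not taken from the repository. The existing type selector carries the choice, so no extra boolean such as use_base_format is needed.

```python
from dataclasses import dataclass
from typing import Dict

# Hypothetical name-to-class registry.
WORKFLOWS: Dict[str, type] = {}


def register(name: str):
    """Class decorator that records a workflow class under a string key."""
    def decorator(cls):
        WORKFLOWS[name] = cls
        return cls
    return decorator


@register("chat_workflow")
class ChatWorkflow:
    uses_chat_template = True


@register("base_model_workflow")
class BaseModelWorkflow:
    uses_chat_template = False


@dataclass
class TaskConfig:
    # One string selects the behavior; no separate boolean flag.
    workflow_type: str = "chat_workflow"


def build_workflow(config: TaskConfig):
    """Instantiate the workflow class named in the config."""
    try:
        cls = WORKFLOWS[config.workflow_type]
    except KeyError:
        raise ValueError(f"unknown workflow type: {config.workflow_type!r}") from None
    return cls()
```

With this shape, switching to base-model training is a one-line config change (workflow_type set to "base_model_workflow") rather than a new field.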

garyzhang99 · 2025-05-28T06:19:48Z

The corresponding changes will be addressed in another PR. Closing this PR for now.

add use base format

6e8ab38

garyzhang99 requested a review from pan-x-c May 21, 2025 07:17

remove added tokenzier to make the pr only have single change

5251a79

pan-x-c reviewed May 23, 2025

garyzhang99 closed this May 28, 2025

3 participants
